对世界各地的急诊部门(ED)服务的需求不断增长,特别是在Covid-19大流行下。风险三环在优先考虑最需要它们的患者的有限医疗资源方面发挥着至关重要的作用。最近,普遍使用电子健康记录(EHR)已经产生了大量的存储数据,伴随着开发可改善紧急护理的预测模型的巨大机会。然而,没有基于大型公共EHR的广泛接受的ED基准,这是新的研究人员可以轻松访问的基准。填补这种差距的成功可以使研究人员更快,方便地开始研究,而无需详细数据预处理,并促进不同研究和方法之间的比较。在本文中,基于医疗信息MART为重症监护IV急诊部门(MIMIC-IV-ED)数据库,我们提出了一款公共ED基准套件,并获得了从2011年到2019年的50万ED访问的基准数据集。三个ed已经介绍了基于预测任务(住院,关键结果和72小时ED Revisit),其中实施了各种流行的方法,从机器学习方法到临床评分系统进行了实施。他们的性能结果评估并进行了比较。我们的代码是开源,因此任何具有访问模仿-IV-ED的人都可以遵循相同的数据处理步骤,构建基准,并重现实验。本研究提供了洞察力,建议,以及未来研究人员的协议,以处理原始数据并快速建立紧急护理模型。
translated by 谷歌翻译
We describe a Physics-Informed Neural Network (PINN) that simulates the flow induced by the astronomical tide in a synthetic port channel, with dimensions based on the Santos - S\~ao Vicente - Bertioga Estuarine System. PINN models aim to combine the knowledge of physical systems and data-driven machine learning models. This is done by training a neural network to minimize the residuals of the governing equations in sample points. In this work, our flow is governed by the Navier-Stokes equations with some approximations. There are two main novelties in this paper. First, we design our model to assume that the flow is periodic in time, which is not feasible in conventional simulation methods. Second, we evaluate the benefit of resampling the function evaluation points during training, which has a near zero computational cost and has been verified to improve the final model, especially for small batch sizes. Finally, we discuss some limitations of the approximations used in the Navier-Stokes equations regarding the modeling of turbulence and how it interacts with PINNs.
translated by 谷歌翻译
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
translated by 谷歌翻译
研究人员通常会采用数值方法来理解和预测海洋动力学,这是掌握环境现象的关键任务。在地形图很复杂,有关基础过程的知识不完整或应用程序至关重要的情况下,此类方法可能不适合。另一方面,如果观察到海洋动力学,则可以通过最近的机器学习方法来利用它们。在本文中,我们描述了一种数据驱动的方法,可以预测环境变量,例如巴西东南海岸的Santos-Sao Vicente-Bertioga estuarine系统的当前速度和海面高度。我们的模型通过连接最新的序列模型(LSTM和Transformers)以及关系模型(图神经网络)来利用时间和空间归纳偏见,以学习时间特征和空间特征,观察站点之间共享的关系。我们将结果与桑托斯运营预测系统(SOFS)进行比较。实验表明,我们的模型可以实现更好的结果,同时保持灵活性和很少的领域知识依赖性。
translated by 谷歌翻译
In the era of noisy intermediate scale quantum devices, variational quantum circuits (VQCs) are currently one of the main strategies for building quantum machine learning models. These models are made up of a quantum part and a classical part. The quantum part is given by a parametrization $U$, which, in general, is obtained from the product of different quantum gates. By its turn, the classical part corresponds to an optimizer that updates the parameters of $U$ in order to minimize a cost function $C$. However, despite the many applications of VQCs, there are still questions to be answered, such as for example: What is the best sequence of gates to be used? How to optimize their parameters? Which cost function to use? How the architecture of the quantum chips influences the final results? In this article, we focus on answering the last question. We will show that, in general, the cost function will tend to a typical average value the closer the parameterization used is from a $2$-design. Therefore, the closer this parameterization is to a $2$-design, the less the result of the quantum neural network model will depend on its parametrization. As a consequence, we can use the own architecture of the quantum chips to defined the VQC parametrization, avoiding the use of additional swap gates and thus diminishing the VQC depth and the associated errors.
translated by 谷歌翻译
Dataset scaling, also known as normalization, is an essential preprocessing step in a machine learning pipeline. It is aimed at adjusting attributes scales in a way that they all vary within the same range. This transformation is known to improve the performance of classification models, but there are several scaling techniques to choose from, and this choice is not generally done carefully. In this paper, we execute a broad experiment comparing the impact of 5 scaling techniques on the performances of 20 classification algorithms among monolithic and ensemble models, applying them to 82 publicly available datasets with varying imbalance ratios. Results show that the choice of scaling technique matters for classification performance, and the performance difference between the best and the worst scaling technique is relevant and statistically significant in most cases. They also indicate that choosing an inadequate technique can be more detrimental to classification performance than not scaling the data at all. We also show how the performance variation of an ensemble model, considering different scaling techniques, tends to be dictated by that of its base model. Finally, we discuss the relationship between a model's sensitivity to the choice of scaling technique and its performance and provide insights into its applicability on different model deployment scenarios. Full results and source code for the experiments in this paper are available in a GitHub repository.\footnote{https://github.com/amorimlb/scaling\_matters}
translated by 谷歌翻译
Language modeling, a central task in natural language processing, involves estimating a probability distribution over strings. In most cases, the estimated distribution sums to 1 over all finite strings. However, in some pathological cases, probability mass can ``leak'' onto the set of infinite sequences. In order to characterize the notion of leakage more precisely, this paper offers a measure-theoretic treatment of language modeling. We prove that many popular language model families are in fact tight, meaning that they will not leak in this sense. We also generalize characterizations of tightness proposed in previous works.
translated by 谷歌翻译
As language models (LMs) scale, they develop many novel behaviors, good and bad, exacerbating the need to evaluate how they behave. Prior work creates evaluations with crowdwork (which is time-consuming and expensive) or existing data sources (which are not always available). Here, we automatically generate evaluations with LMs. We explore approaches with varying amounts of human effort, from instructing LMs to write yes/no questions to making complex Winogender schemas with multiple stages of LM-based generation and filtering. Crowdworkers rate the examples as highly relevant and agree with 90-100% of labels, sometimes more so than corresponding human-written datasets. We generate 154 datasets and discover new cases of inverse scaling where LMs get worse with size. Larger LMs repeat back a dialog user's preferred answer ("sycophancy") and express greater desire to pursue concerning goals like resource acquisition and goal preservation. We also find some of the first examples of inverse scaling in RL from Human Feedback (RLHF), where more RLHF makes LMs worse. For example, RLHF makes LMs express stronger political views (on gun rights and immigration) and a greater desire to avoid shut down. Overall, LM-written evaluations are high-quality and let us quickly discover many novel LM behaviors.
translated by 谷歌翻译
Topic modeling is widely used for analytically evaluating large collections of textual data. One of the most popular topic techniques is Latent Dirichlet Allocation (LDA), which is flexible and adaptive, but not optimal for e.g. short texts from various domains. We explore how the state-of-the-art BERTopic algorithm performs on short multi-domain text and find that it generalizes better than LDA in terms of topic coherence and diversity. We further analyze the performance of the HDBSCAN clustering algorithm utilized by BERTopic and find that it classifies a majority of the documents as outliers. This crucial, yet overseen problem excludes too many documents from further analysis. When we replace HDBSCAN with k-Means, we achieve similar performance, but without outliers.
translated by 谷歌翻译
Vision Transformers convert images to sequences by slicing them into patches. The size of these patches controls a speed/accuracy tradeoff, with smaller patches leading to higher accuracy at greater computational cost, but changing the patch size typically requires retraining the model. In this paper, we demonstrate that simply randomizing the patch size at training time leads to a single set of weights that performs well across a wide range of patch sizes, making it possible to tailor the model to different compute budgets at deployment time. We extensively evaluate the resulting model, which we call FlexiViT, on a wide range of tasks, including classification, image-text retrieval, open-world detection, panoptic segmentation, and semantic segmentation, concluding that it usually matches, and sometimes outperforms, standard ViT models trained at a single patch size in an otherwise identical setup. Hence, FlexiViT training is a simple drop-in improvement for ViT that makes it easy to add compute-adaptive capabilities to most models relying on a ViT backbone architecture. Code and pre-trained models are available at https://github.com/google-research/big_vision
translated by 谷歌翻译